AMENDMENT TO THE CLAIMS 
1. (Currently Amended) A method of compressing a log of natural language data, 
the log having a plurality of natural language help query strings, each string including at 
least two tokens, the method comprising: 

applying a compression operation to each string, wherein each string is a query 
relative to a help function of a computer system; 
identifying two strings that match each other after the compression operation; and 
removing one of the two matching strings from the log to form a compressed log; 

feeding the compressed log back to the compression operation at least once; and 
training a statistical process with the compressed log. 

2. (Previously Presented) The method of claim 1, wherein the log is a log of user-initiated 
inputs to a help interface. 

3. (Canceled) 

4. (Canceled) 

5. (Original) The method of claim 1, wherein the compression operation is character-based. 

6. (Original) The method of claim 1, wherein the compression operation is token-based. 

7. (Original) The method of claim 1, wherein the compression operation is subsumption. 

8. (Original) The method of claim 7, wherein subsumption includes applying an impossibility 
condition to selectively compute edit distance. 



9. (Original) The method of claim 1, and further comprising: 



applying a second compression operation to each string; 

determining if any two strings match each other after the second compression 
operation; and 
removing one of the two matching strings from the log. 



10. (Original) The method of claim 9, wherein the first compression operation is character-based 
and the second compression operation is token based. 

11. (Original) The method of claim 10, and further comprising applying subsumption after the 
second compression operation is complete. 

12. (Original) The method of claim 11, wherein the subsumption operation is repeated for the 
log. 

13. (Canceled) 

14. (Currently Amended) A system for compressing a query log having a plurality of 
natural language help query strings, each string having a plurality of tokens, the system 

comprising: 



an input for receiving a raw query log of natural language help query strings 
relative to a help function of a computer; 
memory for storing the raw query log; 

a processor for applying at least one compression operation to each string, and for 
scanning the modified strings to determine if any match each other so that 
one of the matching strings can be removed to form a compressed query 
log; and 



wherein the processor is configured to feed the compressed query log back to the 
compression operation at least once and to utilize the compressed query 
log to train a statistical process once the removal is complete. 

15. (Previously Presented) The system of claim 14, wherein each string is a query relative to a 
help function. 

16. (Previously Presented) The system of claim 15, wherein each help-related query is 
relative to a computer system. 

17. (Original) The system of claim 14, wherein the at least one compression operation is 
character-based. 

18. (Original) The system of claim 14, wherein the at least one compression operation is 
token-based. 

19. (Original) The system of claim 14, wherein the at least one compression operation includes 
subsumption. 

20. (Original) The system of claim 19, wherein subsumption includes applying an impossibility 
condition to selectively compute edit distance. 

21. (Original) The system of claim 14, and further comprising: 

applying at least a second compression operation to each string; 

determining if any two strings match each other after the second compression 
operation; and 
removing one of the two matching strings from the log. 

22. (Original) The system of claim 21, wherein the first compression operation is 
character-based and the second compression operation is token based. 

23. (Original) The system of claim 22, and further comprising applying subsumption after the 
second compression operation is complete. 

24. (Original) The system of claim 23, wherein the subsumption operation is repeated for the 
log. 



25. (Canceled) 



